Data Science Techniques to Improve Accuracy of Provider Network Directory

2018 IEEE 25th International Conference on High Performance Computing Workshops (HiPCW). Pub Date: 2018-12-01. DOI: 10.1109/HIPCW.2018.8634423

Priya Kandasamy, Divya Raji, Arun Sundararaman



Abstract

Trivial or tactical as it may appear, provider data inaccuracy continues to be a major challenge in the healthcare industry. With about 250 key attributes per provider and roughly 500K providers in the USA, this translates to maintaining current and correct values for a dataset of 12.5 M attributes that is highly dynamic and volatile. Inaccuracy in this dataset has two major adverse consequences: (a) regulatory penalties ranging from a few thousand dollars to a few million dollars, and (b) potential member attrition due to member dissatisfaction, triggered by increased waiting time, delays in accessing medical services, and effort wasted on reaching out to incorrect providers. Many current solutions carry limitations such as a lack of centralized storage, data latency issues, and non-standardized questionnaires for capturing provider updates. This paper introduces an innovative approach that addresses these limitations using Predictive Analytics and Intake Scoring techniques. Rooted in Data Science, the proposed ensemble model combines the advantages of individual prediction models such as Logistic Regression, Random Forest, Neural Network, and XGBoost. This automated approach also reduces the dependency on external systems and automatically updates the database, keeping it up to date. A detailed analysis of the results obtained with this approach is discussed at length, and the paper concludes with directions for future work.
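The abstract does not say how the individual models are combined, so the following is only a minimal sketch of one common option, weighted soft voting, in which each model's predicted probability for a record is averaged into a single score. The function names, the interpretation of the score as "probability that a provider record is out of date", the threshold, and the sample probabilities are all illustrative assumptions, not values or methods taken from the paper.

```python
def soft_vote(probabilities, weights=None):
    """Weighted average of per-model probabilities for a single record.

    probabilities: dict mapping a model name to its predicted probability.
    weights: optional dict mapping the same names to non-negative weights;
             equal weights are used when omitted.
    """
    if not probabilities:
        raise ValueError("at least one model probability is required")
    if weights is None:
        weights = {name: 1.0 for name in probabilities}
    total_weight = sum(weights[name] for name in probabilities)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    weighted_sum = sum(weights[name] * p for name, p in probabilities.items())
    return weighted_sum / total_weight


def flag_for_review(probabilities, threshold=0.5, weights=None):
    """Return True when the ensemble score reaches the (assumed) threshold."""
    return soft_vote(probabilities, weights) >= threshold


# Hypothetical per-model outputs for one provider record (made-up numbers,
# assumed to mean "probability that the record is out of date").
record_scores = {
    "logistic_regression": 0.62,
    "random_forest": 0.71,
    "neural_network": 0.55,
    "xgboost": 0.80,
}

ensemble_score = soft_vote(record_scores)
print(f"ensemble score: {ensemble_score:.2f}")
print("flag for review:", flag_for_review(record_scores))
```

Averaging probabilities rather than hard class votes lets a confident model outweigh several lukewarm ones; other combination schemes, such as stacking with a meta-learner, would fit the abstract's description equally well.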

