Using Machine Learning to Identify Geographic and Socioeconomic Disparities in Dialysis Facility Outcomes Across the United States
Ziad M Ashkar, Raju Gottumukkala
Ochsner Journal, volume 25, issue 3, pages 170-180 (2025). Journal Article.
Impact factor: 1.2. JCR: Q2 (Medicine, General & Internal).
DOI: https://doi.org/10.31486/toj.25.0040
PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12456289/pdf/
Citations: 0
Abstract
Background: Despite progress in dialysis care, the patient outcomes of mortality, hospitalization, and readmission rates remain unsatisfactory because of complex clinical, demographic, and socioeconomic interactions. For this study, we used unsupervised machine learning to identify clusters of dialysis facilities based on quality metrics and sociodemographic factors, with attention to racial and geographic disparities.
Methods: We obtained facility-level data from data.cms.gov and ZIP Code Tabulation Area-level sociodemographic data from the 2021 American Community Survey via the US Census Bureau application programming interface. Datasets were linked by ZIP code, standardized, and analyzed using principal component analysis and k-means clustering. We examined geographic patterns by US Census Bureau regions. Analyses were conducted in Python version 3.11.6 (Python Software Foundation) with the following libraries: pandas for data manipulation, scikit-learn for machine learning and principal component analysis, Matplotlib and Seaborn for data visualization, and GeoPandas for geographic mapping and spatial analysis.
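The linkage, standardization, principal component analysis, and k-means steps described in the Methods could be sketched roughly as follows. This is a minimal illustration on synthetic data, not the authors' code: the column names (`zip`, `mortality_ratio`, `fistula_rate`, `median_income`, `unemployment_rate`, `sim_group`), the function name `cluster_facilities`, the choice of two principal components, and the decision to run k-means on the component scores rather than on the standardized features are all assumptions made for the example.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler


def cluster_facilities(facilities, acs, feature_cols,
                       n_clusters=2, n_components=2, seed=0):
    """Link two tables by ZIP code, standardize, run PCA, then k-means.

    Hypothetical helper: the abstract does not specify column names,
    the number of components, or what k-means was fitted on.
    """
    # Treat ZIP codes as zero-padded strings so leading zeros survive the join.
    facilities = facilities.assign(zip=facilities["zip"].astype(str).str.zfill(5))
    acs = acs.assign(zip=acs["zip"].astype(str).str.zfill(5))

    merged = facilities.merge(acs, on="zip", how="inner")
    merged = merged.dropna(subset=feature_cols).reset_index(drop=True)

    # Standardize each feature to zero mean and unit variance.
    X = StandardScaler().fit_transform(merged[feature_cols])

    # Reduce to a few principal components, then cluster the scores.
    scores = PCA(n_components=n_components, random_state=seed).fit_transform(X)
    labels = KMeans(n_clusters=n_clusters, n_init=10,
                    random_state=seed).fit_predict(scores)

    out = merged.copy()
    for i in range(scores.shape[1]):
        out[f"pc{i + 1}"] = scores[:, i]
    out["cluster"] = labels
    return out


# Synthetic demonstration data (invented values, two well-separated groups).
rng = np.random.default_rng(0)
n = 40
zips = [f"{z:05d}" for z in range(1000, 1000 + n)]
sim_group = np.repeat([0, 1], n // 2)

facilities = pd.DataFrame({
    "zip": zips,
    "sim_group": sim_group,
    "mortality_ratio": 1.0 + 0.5 * sim_group + rng.normal(0, 0.02, n),
    "fistula_rate": 60 - 20 * sim_group + rng.normal(0, 1, n),
})
acs = pd.DataFrame({
    "zip": zips,
    "median_income": 70000 - 30000 * sim_group + rng.normal(0, 1000, n),
    "unemployment_rate": 4 + 4 * sim_group + rng.normal(0, 0.2, n),
})

features = ["mortality_ratio", "fistula_rate",
            "median_income", "unemployment_rate"]
result = cluster_facilities(facilities, acs, features)
print(result.groupby("cluster")[features].mean().round(2))
```

Cluster labels from k-means are arbitrary, so which group ends up as 0 or 1 can change between runs or seeds; a profile table like the one printed above is what gives the labels meaning.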
Results: Two facility clusters emerged: Cluster 0 (n=4,609) and Cluster 1 (n=2,857). Compared to Cluster 0, Cluster 1 was characterized by poorer outcomes (higher mortality, hospitalization, readmission, anemia, catheter use, and hyperphosphatemia); lower rates of fistula use; and lower dialysis adequacy. Cluster 1 facilities were more prevalent in regions with lower income, higher unemployment, and lower rates of college education, and they served populations with greater proportions of Black and Hispanic residents. Geographically, Cluster 1 facilities were concentrated in the southern and western United States. Compared to Cluster 0, a larger share of Cluster 1 facilities were for-profit facilities (91.4% vs 88.5%).
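To put the reported counts in proportion, the short calculation below derives each cluster's share of all clustered facilities and the pooled for-profit share implied by the two reported percentages. Only the four input numbers come from the abstract; the variable names are illustrative, and the pooled figure is an approximation because the published percentages are rounded.

```python
# Reported in the abstract.
n_cluster0 = 4609
n_cluster1 = 2857
for_profit_share0 = 0.885
for_profit_share1 = 0.914

total = n_cluster0 + n_cluster1
share0 = n_cluster0 / total
share1 = n_cluster1 / total

# Pooled for-profit share: weight each cluster's percentage by its size.
# Approximate, since the inputs are rounded to one decimal place.
pooled_for_profit = (
    for_profit_share0 * n_cluster0 + for_profit_share1 * n_cluster1
) / total

print(f"Facilities clustered: {total}")
print(f"Cluster 0 share: {share0:.1%}")
print(f"Cluster 1 share: {share1:.1%}")
print(f"Pooled for-profit share (approx.): {pooled_for_profit:.1%}")
```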
Conclusion: This study highlights a distinct cluster of underperforming dialysis clinics serving socioeconomically disadvantaged and racially diverse populations. Addressing these disparities requires multifaceted strategies including patient-level, institutional, and policy-level interventions.
About the journal:
The Ochsner Journal is a quarterly publication designed to support Ochsner's mission to improve the health of our community through a commitment to innovation in healthcare, medical research, and education. The Ochsner Journal provides an active dialogue on practice standards in today's changing healthcare environment. Emphasis will be given to topics of great societal and medical significance.