Title: Automating assignment of HIV+ patients into phenogroups from demography bound phenotype attack rates.
Author: Nick Williams
Journal: AMIA ... Annual Symposium proceedings. AMIA Symposium, vol. 2024, pp. 1235-1244
Published: 2025-05-22 (eCollection; EPub date 2024/1/1)
Publication type: Journal Article
PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12099429/pdf/
Citations: 0
Abstract
Evidence-based medicine and health data for policy should update their statistical data modeling methods to take advantage of at-scale data. One challenge with at-scale data is information segmentation for clinical presentation discovery and cohort assignment. We use a gradient boosting machine (GBM) to segment 94,379,175,015 diagnostic clinical events attributable to 283,632,789 Centers for Medicare and Medicaid Services beneficiaries over 22 observation years. Diagnostic events were aggregated into attack rates by demography and phenome-wide association study (PheWAS) codes. The resulting attack rates were then segmented within a user-defined clinical status of interest, in this case HIV status. A total of 1,753,647 HIV+ beneficiaries were considered. The GBM model assigned 19,651,408 PheWAS attack rates from 69,133,296 ICD attack rates into phenogroups/nodes.
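
The aggregation step described above, in which diagnostic events are rolled up into attack rates by demography and code, can be illustrated with a minimal sketch. The abstract does not give the exact attack-rate definition, so this assumes the common epidemiological one: distinct beneficiaries with at least one event for a code, divided by the number of beneficiaries in that demographic stratum. The function name `attack_rates`, the stratum tuples, and the codes `code_A` and `code_B` are hypothetical placeholders, not values from the paper.

```python
from collections import defaultdict


def attack_rates(events, stratum_sizes):
    """Aggregate diagnostic events into attack rates per (stratum, code).

    events: iterable of (beneficiary_id, stratum, code) tuples.
    stratum_sizes: dict mapping stratum -> number of beneficiaries in it.
    Assumed definition (not confirmed by the source): distinct affected
    beneficiaries divided by the stratum population.
    """
    affected = defaultdict(set)
    for beneficiary_id, stratum, code in events:
        # A set ensures repeat events for one beneficiary count once.
        affected[(stratum, code)].add(beneficiary_id)
    return {
        (stratum, code): len(ids) / stratum_sizes[stratum]
        for (stratum, code), ids in affected.items()
    }


# Toy data: strata are (sex, age band); codes are placeholders.
events = [
    (1, ("F", "65-74"), "code_A"),
    (1, ("F", "65-74"), "code_A"),  # repeat event, same beneficiary
    (2, ("F", "65-74"), "code_A"),
    (3, ("M", "65-74"), "code_B"),
]
stratum_sizes = {("F", "65-74"): 4, ("M", "65-74"): 2}

rates = attack_rates(events, stratum_sizes)
for key, rate in sorted(rates.items()):
    print(key, rate)
```

Rates of this form, one per stratum and code, are the kind of input that a segmentation model such as a GBM could then group into phenogroups; the modeling step itself is not sketched here because the abstract does not specify its features or target.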