{"title":"将无监督风险分层扩展到大规模临床数据集","authors":"Z. Syed, I. Rubinfeld","doi":"10.4018/jkdb.2011010103","DOIUrl":null,"url":null,"abstract":"While rare clinical events, by definition, occur infrequently in a population, the consequences of these events can be drastic. Unfortunately, developing risk stratification algorithms for these conditions requires large volumes of data to capture enough positive and negative cases. This process is slow, expensive, and burdensome to both patients and caregivers. This paper proposes an unsupervised machine learning approach to address this challenge and risk stratify patients for adverse outcomes without use of a priori knowledge or labeled training data. The key idea of the approach is to identify high-risk patients as anomalies in a population. Cases are identified through a novel algorithm that finds an approximate solution to the k-nearest neighbor problem using locality sensitive hashing (LSH) based on p-stable distributions. The algorithm is optimized to use multiple LSH searches, each with a geometrically increasing radius, to find the k-nearest neighbors of patients in a dynamically changing dataset where patients are being added or removed over time. When evaluated on data from the National Surgical Quality Improvement Program (NSQIP), this approach successfully identifies patients at an elevated risk of mortality and rare morbidities. The LSH-based algorithm provided a substantial improvement over an exact k-nearest neighbor algorithm in runtime, while achieving a similar accuracy.","PeriodicalId":160270,"journal":{"name":"Int. J. Knowl. Discov. Bioinform.","volume":"34 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"3","resultStr":"{\"title\":\"Scaling Unsupervised Risk Stratification to Massive Clinical Datasets\",\"authors\":\"Z. Syed, I. Rubinfeld\",\"doi\":\"10.4018/jkdb.2011010103\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"While rare clinical events, by definition, occur infrequently in a population, the consequences of these events can be drastic. Unfortunately, developing risk stratification algorithms for these conditions requires large volumes of data to capture enough positive and negative cases. This process is slow, expensive, and burdensome to both patients and caregivers. This paper proposes an unsupervised machine learning approach to address this challenge and risk stratify patients for adverse outcomes without use of a priori knowledge or labeled training data. The key idea of the approach is to identify high-risk patients as anomalies in a population. Cases are identified through a novel algorithm that finds an approximate solution to the k-nearest neighbor problem using locality sensitive hashing (LSH) based on p-stable distributions. The algorithm is optimized to use multiple LSH searches, each with a geometrically increasing radius, to find the k-nearest neighbors of patients in a dynamically changing dataset where patients are being added or removed over time. When evaluated on data from the National Surgical Quality Improvement Program (NSQIP), this approach successfully identifies patients at an elevated risk of mortality and rare morbidities. The LSH-based algorithm provided a substantial improvement over an exact k-nearest neighbor algorithm in runtime, while achieving a similar accuracy.\",\"PeriodicalId\":160270,\"journal\":{\"name\":\"Int. J. Knowl. Discov. Bioinform.\",\"volume\":\"34 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"1900-01-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"3\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Int. J. Knowl. Discov. Bioinform.\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.4018/jkdb.2011010103\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Int. J. Knowl. Discov. Bioinform.","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.4018/jkdb.2011010103","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Scaling Unsupervised Risk Stratification to Massive Clinical Datasets
While rare clinical events, by definition, occur infrequently in a population, the consequences of these events can be drastic. Unfortunately, developing risk stratification algorithms for these conditions requires large volumes of data to capture enough positive and negative cases. This process is slow, expensive, and burdensome to both patients and caregivers. This paper proposes an unsupervised machine learning approach to address this challenge and risk stratify patients for adverse outcomes without use of a priori knowledge or labeled training data. The key idea of the approach is to identify high-risk patients as anomalies in a population. Cases are identified through a novel algorithm that finds an approximate solution to the k-nearest neighbor problem using locality sensitive hashing (LSH) based on p-stable distributions. The algorithm is optimized to use multiple LSH searches, each with a geometrically increasing radius, to find the k-nearest neighbors of patients in a dynamically changing dataset where patients are being added or removed over time. When evaluated on data from the National Surgical Quality Improvement Program (NSQIP), this approach successfully identifies patients at an elevated risk of mortality and rare morbidities. The LSH-based algorithm provided a substantial improvement over an exact k-nearest neighbor algorithm in runtime, while achieving a similar accuracy.