Instance Weighting for Patient-Specific Risk Stratification Models

Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
Pub Date: 2015-08-10
DOI: 10.1145/2783258.2783397

Jen J. Gong, T. Sundt, J. Rawn, J. Guttag


Citations: 14

Abstract

Accurate risk models for adverse outcomes can provide important input to clinical decision-making. Surprisingly, one of the main challenges when using machine learning to build clinically useful risk models is the small amount of data available. Risk models need to be developed for specific patient populations, specific institutions, specific procedures, and specific outcomes. With each exclusion criterion, the amount of relevant training data decreases, until there is often an insufficient amount to learn an accurate model. This difficulty is compounded by the large class imbalance that is often present in medical applications. In this paper, we present an approach to address the problem of small data using transfer learning methods in the context of developing risk models for cardiac surgeries. We explore ways to build surgery-specific and hospital-specific models (the target task) using information from other kinds of surgeries and other hospitals (source tasks). We propose a novel method to weight examples based on their similarity to the target task training examples to take advantage of the useful examples while discounting less relevant ones. We show that incorporating appropriate source data in training can lead to improved performance over using only target task training data, and that our method of instance weighting can lead to further improvements. Applied to a surgical risk stratification task, our method, which used data from two institutions, performed comparably to the risk model published by the Society of Thoracic Surgeons, which was developed and tested on over one hundred thousand surgeries from hundreds of institutions.
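
The abstract does not say how similarity is measured or how the weights enter training, so the following is only a minimal sketch of the general idea of instance weighting, not the authors' method. It assumes a Gaussian kernel on the squared distance from each source example to the centroid of the target training examples, a weight of 1.0 for every target example, and a weighted logistic regression fit by plain gradient descent. The function names (similarity_weights, fit_weighted_logistic, predict_risk), the bandwidth parameter, and the toy data are all hypothetical and exist only for illustration.

```python
import math

def similarity_weights(source_X, target_X, bandwidth=1.0):
    """Assumed similarity: Gaussian kernel on the squared distance from each
    source example to the centroid of the target training examples."""
    n_features = len(target_X[0])
    centroid = [sum(row[j] for row in target_X) / len(target_X)
                for j in range(n_features)]
    weights = []
    for row in source_X:
        sq_dist = sum((row[j] - centroid[j]) ** 2 for j in range(n_features))
        weights.append(math.exp(-sq_dist / (2.0 * bandwidth ** 2)))
    return weights

def fit_weighted_logistic(X, y, w, lr=0.1, epochs=500):
    """Logistic regression by batch gradient descent on the weighted log loss.
    Returns (coefs, intercept)."""
    n_features = len(X[0])
    coefs = [0.0] * n_features
    intercept = 0.0
    total_w = sum(w)
    for _ in range(epochs):
        grad = [0.0] * n_features
        grad_b = 0.0
        for xi, yi, wi in zip(X, y, w):
            z = intercept + sum(c * v for c, v in zip(coefs, xi))
            p = 1.0 / (1.0 + math.exp(-z))
            err = wi * (p - yi)  # each example's gradient is scaled by its weight
            for j in range(n_features):
                grad[j] += err * xi[j]
            grad_b += err
        coefs = [c - lr * g / total_w for c, g in zip(coefs, grad)]
        intercept -= lr * grad_b / total_w
    return coefs, intercept

def predict_risk(coefs, intercept, x):
    """Predicted probability of the adverse outcome for one example."""
    z = intercept + sum(c * v for c, v in zip(coefs, x))
    return 1.0 / (1.0 + math.exp(-z))

# Invented toy data: a small target task plus source examples, one of which
# lies far from the target population and carries a conflicting label.
target_X = [[0.0, 0.0], [0.2, 0.1], [1.0, 1.0], [1.1, 0.9]]
target_y = [0, 0, 1, 1]
source_X = [[0.1, 0.0], [0.9, 1.0], [5.0, 5.0]]
source_y = [0, 1, 0]

source_w = similarity_weights(source_X, target_X, bandwidth=1.0)
train_X = target_X + source_X
train_y = target_y + source_y
train_w = [1.0] * len(target_X) + source_w  # target examples keep full weight
coefs, intercept = fit_weighted_logistic(train_X, train_y, train_w)
```

In this toy setup the distant source example receives a weight close to zero, so it has almost no influence on the fitted model, while the two source examples that resemble the target data contribute nearly as much as target examples do. A real application would also need to handle the class imbalance the abstract mentions, which this sketch ignores.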



